!pip install azureml.opendatasets
Successfully installed PyJWT-2.4.0 SecretStorage-3.3.2 adal-1.2.7 applicationinsights-0.11.10 argcomplete-2.0.0 azure-common-1.1.28 azure-core-1.22.1 azure-graphrbac-0.61.1 azure-identity-1.7.0 azure-mgmt-authorization-2.0.0 azure-mgmt-containerregistry-9.1.0 azure-mgmt-core-1.3.0 azure-mgmt-keyvault-9.3.0 azure-mgmt-resource-20.1.0 azure-mgmt-storage-19.1.0 azureml-core-1.41.0.post3 azureml-dataprep-3.1.3 azureml-dataprep-native-38.0.0 azureml-dataprep-rslex-2.5.4 azureml-dataset-runtime-1.41.0 azureml-telemetry-1.41.0 azureml.opendatasets-1.41.0 backports.tempfile-1.0 backports.weakref-1.0.post1 bcrypt-3.2.2 cryptography-36.0.2 distro-1.7.0 docker-5.0.3 dotnetcore2-2.1.23 fusepy-3.0.1 humanfriendly-10.0 isodate-0.6.1 jeepney-0.8.0 jmespath-0.10.0 jsonpickle-2.2.0 knack-0.9.0 msal-1.17.0 msal-extensions-0.3.1 msrest-0.6.21 msrestazure-0.6.4 ndg-httpsclient-0.5.1 paramiko-2.10.4 pathspec-0.9.0 pkginfo-1.8.2 portalocker-2.4.0 py4j-0.10.9.3 pyarrow-3.0.0 pynacl-1.5.0 pyopenssl-22.0.0 pyspark-3.2.1 websocket-client-1.3.2
!pip install azureml-dataset-runtime
Requirement already satisfied: azureml-dataset-runtime in /usr/local/lib/python3.7/dist-packages (1.41.0)
from azureml.opendatasets import NycSafety
The New York Department oversees arterial and residential streets in New York City, receives reports through the 311 call center, and uses a mapping and tracking system to identify incident locations and schedule crews. One call to 311 can generate multiple repairs. Weather conditions, frigid temperatures, and precipitation influence how long a repair takes. On days when the weather is cooperative and there's no precipitation, crews can fill several thousand potholes.
The New York Department oversees approximately X street lights that illuminate arterial and residential streets in New York City, and performs repairs and bulb replacements in response to residents' reports of street light outages. Whenever the department receives a report of an "All Out", the electrician assigned to make the repair looks at the lights in that circuit (each circuit has 8-16 lights) to make sure they're working properly.
This data is updated daily.
# This is a package in preview.
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety_table = NycSafety(start_date=start_date, end_date=end_date)
nyc_safety = safety_table.to_pandas_dataframe()
/usr/local/lib/python3.7/dist-packages/azureml/opendatasets/dataaccess/_blob_accessor.py:520: Warning: Please install azureml-dataset-runtimeusing pip install azureml-dataset-runtime "Please install azureml-dataset-runtime" + "using pip install azureml-dataset-runtime", Warning)
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00000-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13418-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00001-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13419-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00002-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13420-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00003-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13421-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00004-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13422-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00005-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13423-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00006-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13424-3.c000.snappy.parquet
[Info] read from /tmp/tmpdo5oijjj/https%3A/%2Fazureopendatastorage.azurefd.net/citydatacontainer/Safety/Release/city=NewYorkCity/part-00007-tid-7635389979391348899-42387111-75db-4000-84ca-6158481505d9-13425-3.c000.snappy.parquet
nyc_safety
| | dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | Safety | 311_All | 2015-06-26 17:39:04 | Noise - Street/Sidewalk | Loud Music/Party | Closed | None | 40.735370 | -73.989969 | None | |
| 28 | Safety | 311_All | 2015-11-04 11:54:00 | Water System | Leak (Use Comments) (WA2) | Closed | None | 40.705333 | -73.959525 | None | |
| 74 | Safety | 311_All | 2015-06-03 11:49:22 | UNSANITARY CONDITION | PESTS | Closed | 1250 LELAND AVENUE | 40.831518 | -73.863392 | None | |
| 75 | Safety | 311_All | 2015-09-21 20:18:16 | Illegal Parking | Blocked Hydrant | Closed | 127 BAY 13 STREET | 40.607025 | -74.009133 | None | |
| 98 | Safety | 311_All | 2015-07-24 18:25:17 | Illegal Parking | Double Parked Blocking Vehicle | Closed | 335 BOWERY | 40.726047 | -73.991908 | None | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3550850 | Safety | 311_All | 2015-05-01 11:27:53 | HPD Literature Request | The ABCs of Housing | Closed | None | NaN | NaN | None | |
| 3550853 | Safety | 311_All | 2015-08-08 21:17:00 | Street Light Condition | Street Light Out | Closed | None | 40.692653 | -73.754600 | None | |
| 3550894 | Safety | 311_All | 2015-07-06 17:39:37 | Damaged Tree | Branch Cracked and Will Fall | Closed | 383 EAST 198 STREET | 40.866433 | -73.886300 | None | |
| 3550899 | Safety | 311_All | 2015-05-09 22:16:15 | Noise - Commercial | Loud Music/Party | Closed | 1127 PRESIDENT STREET | 40.668199 | -73.952818 | None | |
| 3550903 | Safety | 311_All | 2015-12-05 08:20:00 | Noise | Noise: Construction Before/After Hours (NM1) | Closed | 33 13 STREET | 40.670611 | -73.996226 | None | |
1522335 rows × 11 columns
nyc_safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1522335 entries, 15 to 3550903
Data columns (total 11 columns):
 #   Column              Non-Null Count    Dtype
---  ------              --------------    -----
 0   dataType            1522335 non-null  object
 1   dataSubtype         1522335 non-null  object
 2   dateTime            1522335 non-null  datetime64[ns]
 3   category            1522335 non-null  object
 4   subcategory         1522256 non-null  object
 5   status              1522335 non-null  object
 6   address             1220051 non-null  object
 7   latitude            1377919 non-null  float64
 8   longitude           1377919 non-null  float64
 9   source              0 non-null        object
 10  extendedProperties  1522335 non-null  object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 139.4+ MB
import pandas as pd
import folium
import requests
nyc_safety['latitude'] = nyc_safety['latitude'].fillna(0)
nyc_safety['longitude'] = nyc_safety['longitude'].fillna(0)
map_osm = folium.Map(location=[40.705920, -73.921794], zoom_start=13)
map_osm
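Filling missing coordinates with 0 keeps every row, but it moves incidents without a location to latitude 0, longitude 0, far from New York. The `info()` output above shows that 1,377,919 of the 1,522,335 rows have coordinates. One possible alternative, sketched here on a small made-up DataFrame that only borrows the `latitude` and `longitude` column names, is to drop rows without coordinates before plotting. The toy values and the `located` name are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for nyc_safety; the values are made up for illustration.
toy = pd.DataFrame({
    "category": ["Noise", "Water System", "HPD Literature Request"],
    "latitude": [40.735370, 40.705333, np.nan],
    "longitude": [-73.989969, -73.959525, np.nan],
})

# Keep only rows that have both coordinates, instead of filling them with 0.
located = toy.dropna(subset=["latitude", "longitude"])

print(len(toy), len(located))
```

Whether to drop or fill depends on what follows: dropping suits map plots, while counts per category may still want the unlocated rows.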
for column in nyc_safety.columns:
    print(column)
    print(nyc_safety[column].unique())
    print("\n")
dataType
['Safety']

dataSubtype
['311_All']

dateTime
['2015-06-26T17:39:04.000000000' '2015-11-04T11:54:00.000000000'
 '2015-06-03T11:49:22.000000000' ... '2015-07-06T17:39:37.000000000'
 '2015-05-09T22:16:15.000000000' '2015-12-05T08:20:00.000000000']

category
['Noise - Street/Sidewalk' 'Water System' 'UNSANITARY CONDITION'
 'Illegal Parking' 'Noise - Residential' 'Street Condition' 'DOOR/WINDOW'
 'Maintenance or Facility' 'PAINT/PLASTER' 'Noise - Commercial'
 'FLOORING/STAIRS' 'Housing - Low Income Senior' 'Traffic Signal Condition'
 'Noise - Vehicle' 'Broken Muni Meter' 'Overflowing Litter Baskets' 'Rodent'
 'WATER LEAK' 'HEAT/HOT WATER' 'Graffiti' 'DOF Property - Reduction Issue'
 'Other Enforcement' 'Derelict Vehicle' 'Blocked Driveway'
 'Non-Residential Heat' 'Consumer Complaint' 'ELECTRIC' 'Sewer'
 'Street Light Condition' 'Noise' 'GENERAL' 'Asbestos'
 'DOF Parking - Tax Exemption' 'Damaged Tree' 'PLUMBING'
 'Miscellaneous Categories' 'DOF Property - Owner Issue' 'New Tree Request'
 'Electrical' 'Street Sign - Missing' 'General Construction/Plumbing'
 'Root/Sewer/Sidewalk Condition' 'Traffic' 'SAFETY' 'Sanitation Condition'
 'Housing Options' 'Building/Use' 'Animal Abuse' 'Indoor Air Quality'
 'Special Projects Inspection Team (SPIT)' 'Food Establishment'
 'Noise - Park' 'Drinking' 'Noise - Helicopter' 'Taxi Complaint'
 'Sidewalk Condition' 'Construction Safety Enforcement' 'Construction'
 'Overgrown Tree/Branches' 'Dirty Conditions'
 'Unsanitary Animal Pvt Property' 'Violation of Park Rules'
 'Homeless Person Assistance' 'Missed Collection (All Materials)'
 'Benefit Card Replacement' 'Emergency Response Team (ERT)'
 'HPD Literature Request' 'Bus Stop Shelter Placement'
 'Street Sign - Damaged' 'Homeless Encampment' 'Curb Condition'
 'Standing Water' 'DOF Parking - Payment Issue' 'Derelict Vehicles'
 'Food Poisoning' 'Taxi Report' 'SCRIE' 'Dead Tree' 'Plumbing' 'APPLIANCE'
 'DOF Property - Payment Issue' 'Lead' 'For Hire Vehicle Complaint'
 'Non-Emergency Police Matter' 'DOF Property - Update Account'
 'DOF Property - RPIE Issue' 'Illegal Tree Damage' 'Harboring Bees/Wasps'
 'BEST/Site Safety' 'Air Quality' 'Day Care' 'DPR Internal'
 'Illegal Animal Kept as Pet' 'Water Conservation' 'Elevator' 'Vending'
 'Mold' 'Posting Advertisement' 'Quality of Life' 'Illegal Animal Sold'
 'Industrial Waste' 'City Vehicle Placard Complaint' 'Hazardous Materials'
 'DOF Property - Request Copy' 'Home Delivered Meal - Missed Delivery'
 'Sweeping/Missed' 'For Hire Vehicle Report' 'DOF Parking - DMV Clearance'
 'School Maintenance' 'Litter Basket / Request' 'Highway Condition'
 'Street Sign - Dangling' 'OUTSIDE BUILDING' 'Found Property'
 'Foam Ban Enforcement' 'Animal in a Park' 'Poison Ivy'
 'Building Marshals office' 'Collection Truck Noise' 'Smoking'
 'Recycling Enforcement' 'Disorderly Youth' 'Indoor Sewage'
 'Investigations and Discipline (IAD)' 'Derelict Bicycle' 'Plant'
 'Vacant Lot' 'Sweeping/Inadequate' 'X-Ray Machine/Equipment'
 'Public Payphone Complaint' 'Facades' 'Water Quality'
 'Noise - House of Worship' 'DOF Parking - Request Status'
 'Broken Parking Meter' 'Taxi Compliment' 'Cranes and Derricks'
 'Overflowing Recycling Baskets' 'ATF' 'ELEVATOR' 'Senior Center Complaint'
 'Bike/Roller/Skate Chronic' 'Elder Abuse' 'Unsanitary Pigeon Condition'
 'Bus Stop Shelter Complaint' 'OEM Literature Request' 'Unleashed Dog'
 'Bereavement Support Group' 'Ferry Inquiry' 'Boilers' 'Illegal Fireworks'
 'DCA / DOH New License Application Request' 'Urinating in Public'
 'Scaffold Safety' 'Beach/Pool/Sauna Complaint' "Alzheimer's Care"
 'Bridge Condition' 'DOF Parking - Request Copy' 'Public Toilet'
 'Drinking Water' 'Taxpayer Advocate Inquiry' 'Advocate - Other'
 'Ferry Complaint' 'VACANT APARTMENT' 'Highway Sign - Dangling'
 'Window Guard' 'Highway Sign - Damaged' 'Panhandling'
 'Home Delivered Meal Complaint' 'Municipal Parking Facility'
 'Animal Facility - No Permit' 'Bike Rack Condition'
 'Unsanitary Animal Facility' 'Adopt-A-Basket'
 'Home Care Provider Complaint' 'Snow' 'FATF' 'Lifeguard' 'Parking Card'
 'Special Natural Area District (SNAD)' 'Advocate-Personal Exemptions'
 'Highway Sign - Missing' 'DHS Income Savings Requirement'
 'DOF Property - Property Value' 'Tattooing'
 'Case Management Agency Complaint' 'Stalled Sites' 'SRDE'
 'Forensic Engineering' 'Ferry Permit' 'NORC Complaint' 'Calorie Labeling'
 'Advocate-SCRIE/DRIE' 'Transportation Provider Complaint'
 'Legal Services Provider Complaint' 'DOF Property - City Rebate'
 'Bottled Water' 'Radioactive Material' 'Interior Demo' 'Tunnel Condition'
 'Advocate-Prop Class Incorrect' 'Building Condition' 'Squeegee' 'AGENCY'
 'Tanning' 'Advocate - RPIE' 'Advocate-Commercial Exemptions']

subcategory
['Loud Music/Party' 'Leak (Use Comments) (WA2)' 'PESTS' ... 'Dead End Sign'
 'Woodside Settlement Project' 'Pedicab Driver']

status
['Closed' 'Pending' 'Open' 'Assigned' 'Started' 'Draft' 'Unspecified']

address
[None '1250 LELAND AVENUE' '127 BAY 13 STREET' ... '679 ROCKAWAY TURNPIKE'
 '44-0-44-98 CRESCENT STREET' '60 EAST 7 STREET']

latitude
[40.73537039 40.70533331 40.83151754 ... 40.62085745 40.76977938 40.69657653]

longitude
[-73.98996874 -73.95952516 -73.86339214 ... -74.13178925 -73.89080854 -73.93870428]

source
[None]

extendedProperties
['']
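The list of unique categories shows which labels exist but not how often each occurs, and some labels differ only in letter case (for example 'PLUMBING' and 'Plumbing', or 'ELEVATOR' and 'Elevator'). A minimal sketch, assuming a tiny made-up DataFrame with a `category` column, of counting rows per label with `value_counts()` and of merging case variants first. The toy rows and variable names are illustrative only.

```python
import pandas as pd

# Toy stand-in for the category column; rows are made up for illustration.
toy = pd.DataFrame({
    "category": ["PLUMBING", "Plumbing", "Elevator", "ELEVATOR", "Plumbing"],
})

# Count rows per label exactly as spelled.
raw_counts = toy["category"].value_counts()

# Upper-case the labels first so that case variants are counted together.
normalized_counts = toy["category"].str.upper().value_counts()

print(raw_counts)
print(normalized_counts)
```

Upper-casing is only one possible normalization; near-duplicates such as 'Derelict Vehicle' and 'Derelict Vehicles' would still need a separate mapping.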
Examine the spatial spread of four incident categories.
# Add one colored marker per incident for four categories (first 10,000 rows only).
for index, value in nyc_safety.head(10000).iterrows():
    if value.category == 'Noise - Residential':
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='green'), popup='Noise - Residential').add_to(map_osm)
    elif value.category == 'Damaged Tree':
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='purple'), popup='Damaged Tree').add_to(map_osm)
    elif value.category == 'Bike Rack Condition':
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='orange'), popup='Bike Rack Condition').add_to(map_osm)
    elif value.category == 'Curb Condition':
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='black'), popup='Curb Condition').add_to(map_osm)
map_osm
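The category-to-color choice in the marker loop can also be expressed as a lookup table, which keeps the loop body the same no matter how many categories are plotted. The sketch below is plain Python and does not import folium; `CATEGORY_COLORS`, `marker_specs`, and the sample rows are hypothetical names and made-up values, and the colors are picked from the names `folium.Icon` accepts.

```python
# Hypothetical lookup table mapping each category of interest to a marker color.
CATEGORY_COLORS = {
    "Noise - Residential": "green",
    "Damaged Tree": "purple",
    "Bike Rack Condition": "orange",
    "Curb Condition": "black",
}


def marker_specs(rows):
    """Return (latitude, longitude, color, label) tuples for rows in the lookup table."""
    specs = []
    for row in rows:
        color = CATEGORY_COLORS.get(row["category"])
        if color is None:
            continue  # category was not selected for plotting
        specs.append((row["latitude"], row["longitude"], color, row["category"]))
    return specs


# Made-up sample rows shaped like records from the incident table.
sample_rows = [
    {"category": "Damaged Tree", "latitude": 40.866433, "longitude": -73.886300},
    {"category": "Illegal Parking", "latitude": 40.726047, "longitude": -73.991908},
    {"category": "Curb Condition", "latitude": 40.70, "longitude": -73.95},
]

specs = marker_specs(sample_rows)
print(specs)
```

Each tuple could then be passed to `folium.Marker(location=[lat, lon], icon=folium.Icon(color=color), popup=label).add_to(map_osm)`, the same call the loop uses.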
alt_map = folium.Map(location=[40.705920, -73.921794], tiles="Stamen Terrain", zoom_start=12.5)
alt_map
# Mark three WATER LEAK subcategories on the alternate map.
for index, value in nyc_safety.head(10000).iterrows():
    if value.category == 'WATER LEAK':
        if value.subcategory == 'HEAVY FLOW':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='green'), popup="Heavy Flow").add_to(alt_map)
        elif value.subcategory == 'SLOW LEAK':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='purple'), popup="Slow Leak").add_to(alt_map)
        elif value.subcategory == 'DAMP SPOT':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='orange'), popup="Damp Spot").add_to(alt_map)
alt_map
Four more categories on an alternate map
for index, value in nyc_safety.head(10000).iterrows():
    if value.category == 'PLUMBING':
        if value.subcategory == 'TOILET':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='green'), popup="toilet").add_to(alt_map)
        elif value.subcategory == 'BATHTUB/SHOWER':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='purple'), popup="Bathtub/Shower").add_to(alt_map)
        elif value.subcategory == 'BASIN/SINK':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='red'), popup="Basin/Sink").add_to(alt_map)
alt_map
for index, value in nyc_safety.head(10000).iterrows():
    if value.category == 'Street Condition':
        if value.subcategory == 'Pothole':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='green'), popup="pothole").add_to(alt_map)
        elif value.subcategory == 'Defective Hardware':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='purple'), popup="defective hardware").add_to(alt_map)
        elif value.subcategory == 'Rough, Pitted or Cracked Roads':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='red'), popup="rough, pitted or cracked roads").add_to(alt_map)
        elif value.subcategory == 'Cave-in':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='black'), popup="cave-in").add_to(alt_map)
        elif value.subcategory == 'Line/Marking - Faded':
            folium.Marker(location=[value["latitude"], value["longitude"]],
                          icon=folium.Icon(color='white'), popup="line/marking - faded").add_to(alt_map)
alt_map
!pip install geopandas
import geopandas as gpd
import folium
import matplotlib.pyplot as plt
path = gpd.datasets.get_path('nybb')
df = gpd.read_file(path)
df
| | BoroCode | BoroName | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... |
| 1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... |
| 2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... |
| 3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... |
| 4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... |
Now let's zoom in by plotting the boundaries of the five boroughs of New York City.
df.plot(figsize=(6, 6))
plt.show()
df.crs
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US Survey feet)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
df = df.to_crs(epsg=4326)
print(df.crs)
df
epsg:4326
| | BoroCode | BoroName | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ... |
| 1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((-73.83668 40.59495, -73.83678 ... |
| 2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((-73.86706 40.58209, -73.86769 ... |
| 3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ... |
| 4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ... |
m = folium.Map(location=[40.70, -73.94], zoom_start=10, tiles='CartoDB positron')
for _, r in df.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['BoroName']).add_to(geo_j)
    geo_j.add_to(m)
m
Begin examining where specific kinds of service requests occur. Specifically, try to predict which borough a service request will appear in.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="my-application")
m2 = folium.Map(location=[40.70, -73.94], zoom_start=10, tiles='CartoDB positron')
for _, r in df.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['BoroName']).add_to(geo_j)
    geo_j.add_to(m2)
m2
import numpy as np
df1 = pd.DataFrame([ ['Staten Island', 0,0,0,0] , ['Queens',0,0,0,0] , ['Brooklyn', 0,0,0,0] , ['Manhattan',0,0,0,0] , ['The Bronx', 0,0,0,0] ],
columns=['Borough', 'Loud Music/Party', 'Banging/Pounding', 'Loud Talking', 'Loud Television'])
df1
| | Borough | Loud Music/Party | Banging/Pounding | Loud Talking | Loud Television |
|---|---|---|---|---|---|
| 0 | Staten Island | 0 | 0 | 0 | 0 |
| 1 | Queens | 0 | 0 | 0 | 0 |
| 2 | Brooklyn | 0 | 0 | 0 | 0 |
| 3 | Manhattan | 0 | 0 | 0 | 0 |
| 4 | The Bronx | 0 | 0 | 0 | 0 |
def funct(event):
    # Relies on the globals `loc` (reverse-geocoded address string) and `value`
    # (current row), both set by the loop that calls this function.
    if loc.find("Staten Island") != -1:
        df1.at[0, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='purple'), popup=event + '@Staten Island').add_to(m)
    elif loc.find("Queens") != -1:
        df1.at[1, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='blue'), popup=event + '@Queens').add_to(m)
    elif loc.find("Brooklyn") != -1:
        df1.at[2, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='red'), popup=event + '@Brooklyn').add_to(m)
    elif loc.find("Manhattan") != -1:
        df1.at[3, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='yellow'), popup=event + '@Manhattan').add_to(m)
    elif loc.find("Bronx") != -1:
        df1.at[4, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='green'), popup=event + '@Bronx').add_to(m)
for index, value in nyc_safety.head(1000).iterrows():
    if value.category == 'Noise - Residential':
        location = geolocator.reverse( str(value["latitude"]) + ", " + str(value["longitude"]) )
        loc = str(location)
        if value.subcategory == 'Loud Music/Party':
            funct('Loud Music/Party')
        elif value.subcategory == 'Banging/Pounding':
            funct('Banging/Pounding')
        elif value.subcategory == 'Loud Talking':
            funct('Loud Talking')
        elif value.subcategory == 'Loud Television':
            funct('Loud Television')
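The borough lookup in `funct` is a chain of substring tests on the reverse-geocoded address. As a minimal sketch, the same test can be isolated in one small helper. `BOROUGHS` and `borough_from_address` are hypothetical names, and the address strings below are made up in the general shape of a geocoder result, not real Nominatim output.

```python
# Borough names are checked in the same order as the if/elif chain above.
BOROUGHS = ("Staten Island", "Queens", "Brooklyn", "Manhattan", "Bronx")


def borough_from_address(address):
    """Return the first borough name found in an address string, else None."""
    for name in BOROUGHS:
        if name in address:
            return name
    return None


# Made-up address strings, used only to exercise the helper.
print(borough_from_address("62 Wilson Avenue, Brooklyn, New York"))
print(borough_from_address("Somewhere outside the city"))
```

Substring matching is a heuristic: a street named after one borough but located in another could be misclassified, so the result is best treated as approximate.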
The DataFrame now shows the number of occurrences of each subcategory per borough.
df1
| | Borough | Loud Music/Party | Banging/Pounding | Loud Talking | Loud Television |
|---|---|---|---|---|---|
| 0 | Staten Island | 4 | 1 | 1 | 0 |
| 1 | Queens | 17 | 5 | 0 | 1 |
| 2 | Brooklyn | 17 | 6 | 4 | 2 |
| 3 | Manhattan | 16 | 3 | 2 | 0 |
| 4 | The Bronx | 14 | 6 | 1 | 0 |
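The table above is a tally of (borough, subcategory) pairs. A minimal sketch of the same bookkeeping with `collections.Counter` from the standard library follows; the `observations` list is stand-in data, not taken from the dataset.

```python
from collections import Counter

# Stand-in (borough, subcategory) pairs; in the notebook each pair would come
# from one reverse-geocoded service request.
observations = [
    ("Brooklyn", "Loud Music/Party"),
    ("Queens", "Loud Music/Party"),
    ("Brooklyn", "Loud Talking"),
    ("Brooklyn", "Loud Music/Party"),
]

counts = Counter(observations)
print(counts[("Brooklyn", "Loud Music/Party")])  # 2
print(counts[("Manhattan", "Loud Television")])  # 0, missing keys count as zero
```

Because a `Counter` returns zero for unseen keys, no table of zeros has to be built in advance.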
df1.plot.bar(x='Borough', rot=90)
plt.xticks(rotation='horizontal')
plt.margins(0.7)
plt.subplots_adjust(bottom=0.15)
plt.show()
m
df2 = pd.DataFrame(
[ ['Staten Island', 0,0,0,0] , ['Queens',0,0,0,0] , ['Brooklyn', 0,0,0,0] , ['Manhattan',0,0,0,0] , ['The Bronx', 0,0,0,0] ],
columns=['Borough', 'BATHTUB/SHOWER', 'BASIN/SINK', 'BOILER', 'TOILET'])
#columns=['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'The Bronx'])
df2
| | Borough | BATHTUB/SHOWER | BASIN/SINK | BOILER | TOILET |
|---|---|---|---|---|---|
| 0 | Staten Island | 0 | 0 | 0 | 0 |
| 1 | Queens | 0 | 0 | 0 | 0 |
| 2 | Brooklyn | 0 | 0 | 0 | 0 |
| 3 | Manhattan | 0 | 0 | 0 | 0 |
| 4 | The Bronx | 0 | 0 | 0 | 0 |
def funct2(event):
    if loc.find("Staten Island") != -1:
        df2.at[0, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='purple'), popup=event + '@Staten Island').add_to(m2)
    elif loc.find("Queens") != -1:
        df2.at[1, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='blue'), popup=event + '@Queens').add_to(m2)
    elif loc.find("Brooklyn") != -1:
        df2.at[2, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='red'), popup=event + '@Brooklyn').add_to(m2)
    elif loc.find("Manhattan") != -1:
        df2.at[3, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='orange'), popup=event + '@Manhattan').add_to(m2)
    elif loc.find("Bronx") != -1:
        df2.at[4, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='green'), popup=event + '@Bronx').add_to(m2)
for index, value in nyc_safety.head(1000).iterrows():
    if value.category == 'PLUMBING':
        location = geolocator.reverse( str(value["latitude"]) + ", " + str(value["longitude"]) )
        loc = str(location)
        if value.subcategory == 'BATHTUB/SHOWER':
            funct2('BATHTUB/SHOWER')
        elif value.subcategory == 'BASIN/SINK':
            funct2('BASIN/SINK')
        elif value.subcategory == 'BOILER':
            funct2('BOILER')
        elif value.subcategory == 'TOILET':
            funct2('TOILET')
df2
| | Borough | BATHTUB/SHOWER | BASIN/SINK | BOILER | TOILET |
|---|---|---|---|---|---|
| 0 | Staten Island | 0 | 1 | 0 | 0 |
| 1 | Queens | 1 | 1 | 0 | 0 |
| 2 | Brooklyn | 0 | 5 | 1 | 0 |
| 3 | Manhattan | 0 | 0 | 0 | 0 |
| 4 | The Bronx | 2 | 2 | 0 | 1 |
df2.plot.bar(x='Borough', rot=90)
plt.xticks(rotation='horizontal')
plt.margins(0.7)
plt.subplots_adjust(bottom=0.15)
plt.show()
m2
m3 = folium.Map(location=[40.70, -73.94], zoom_start=10, tiles='CartoDB positron')
for _, r in df.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['BoroName']).add_to(geo_j)
    geo_j.add_to(m3)
m3
df3 = pd.DataFrame([ ['Staten Island', 0,0,0,0] , ['Queens',0,0,0,0] , ['Brooklyn', 0,0,0,0] , ['Manhattan',0,0,0,0] , ['The Bronx', 0,0,0,0] ],
columns=['Borough', 'Posted Parking Sign Violation', 'Commercial Overnight Parking', 'Overnight Commercial Storage', 'Double Parked Blocking Traffic'])
df3
| | Borough | Posted Parking Sign Violation | Commercial Overnight Parking | Overnight Commercial Storage | Double Parked Blocking Traffic |
|---|---|---|---|---|---|
| 0 | Staten Island | 0 | 0 | 0 | 0 |
| 1 | Queens | 0 | 0 | 0 | 0 |
| 2 | Brooklyn | 0 | 0 | 0 | 0 |
| 3 | Manhattan | 0 | 0 | 0 | 0 |
| 4 | The Bronx | 0 | 0 | 0 | 0 |
def funct3(event):
    if loc.find("Staten Island") != -1:
        df3.at[0, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='purple'), popup=event + '@Staten Island').add_to(m3)
    elif loc.find("Queens") != -1:
        df3.at[1, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='blue'), popup=event + '@Queens').add_to(m3)
    elif loc.find("Brooklyn") != -1:
        df3.at[2, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='red'), popup=event + '@Brooklyn').add_to(m3)
    elif loc.find("Manhattan") != -1:
        df3.at[3, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='orange'), popup=event + '@Manhattan').add_to(m3)
    elif loc.find("Bronx") != -1:
        df3.at[4, event] += 1
        folium.Marker(location=[value["latitude"], value["longitude"]],
                      icon=folium.Icon(color='green'), popup=event + '@Bronx').add_to(m3)
for index, value in nyc_safety.head(1000).iterrows():
    if value.category == 'Illegal Parking':
        location = geolocator.reverse( str(value["latitude"]) + ", " + str(value["longitude"]) )
        loc = str(location)
        if value.subcategory == 'Posted Parking Sign Violation':
            funct3('Posted Parking Sign Violation')
        elif value.subcategory == 'Commercial Overnight Parking':
            funct3('Commercial Overnight Parking')
        elif value.subcategory == 'Overnight Commercial Storage':
            funct3('Overnight Commercial Storage')
        elif value.subcategory == 'Double Parked Blocking Traffic':
            funct3('Double Parked Blocking Traffic')
df3
| | Borough | Posted Parking Sign Violation | Commercial Overnight Parking | Overnight Commercial Storage | Double Parked Blocking Traffic |
|---|---|---|---|---|---|
| 0 | Staten Island | 0 | 3 | 0 | 0 |
| 1 | Queens | 7 | 0 | 0 | 0 |
| 2 | Brooklyn | 5 | 3 | 1 | 1 |
| 3 | Manhattan | 1 | 1 | 0 | 1 |
| 4 | The Bronx | 2 | 0 | 1 | 1 |
import matplotlib.pyplot as plt
df3.plot.bar(x='Borough', rot=90)
plt.xticks(rotation='horizontal')
plt.margins(0.7)
plt.subplots_adjust(bottom=0.15)
plt.show()
m3
m4 = folium.Map(location=[40.70, -73.94], zoom_start=10, tiles='CartoDB positron')
for _, r in df.iterrows():
    # Without simplifying the representation of each borough,
    # the map might not be displayed
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'orange'})
    folium.Popup(r['BoroName']).add_to(geo_j)
    geo_j.add_to(m4)
m4
def threshold_func(incident):  # looks up counts in df1 only
    val = max(df1.at[0, incident], df1.at[1, incident], df1.at[2, incident], df1.at[3, incident], df1.at[4, incident])
    ret = df1.loc[df1[incident] == val]['Borough']
    # If several boroughs tie for the maximum, pick one of them at random.
    return ret.sample(n=1).iloc[0]
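The predictor above returns the borough with the highest count for a subcategory and picks at random among ties. A minimal sketch of that rule on a plain dictionary follows; `predict_borough` is a hypothetical name, and the counts are copied from the Loud Talking column of the `df1` output shown earlier.

```python
import random


def predict_borough(counts_by_borough, rng=random):
    """Return a borough with the highest count, breaking ties at random."""
    best = max(counts_by_borough.values())
    tied = [b for b, n in counts_by_borough.items() if n == best]
    return rng.choice(tied)


# Loud Talking counts copied from the df1 output.
loud_talking = {"Staten Island": 1, "Queens": 0, "Brooklyn": 4,
                "Manhattan": 2, "The Bronx": 1}
print(predict_borough(loud_talking))  # Brooklyn (the only maximum)
```

Note that the prediction depends only on the subcategory, so every request of a given type is assigned to the same borough unless there is a tie.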
m4
resid = nyc_safety[nyc_safety['category'] == 'Noise - Residential']
The borough predictor is run on a subset of the original DataFrame that contains just one kind of service request.
resid
| | dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | Safety | 311_All | 2015-05-29 11:03:10 | Noise - Residential | Loud Music/Party | Closed | 1050 MADISON STREET | 40.690599 | -73.917453 | None | |
| 156 | Safety | 311_All | 2015-05-03 02:27:22 | Noise - Residential | Loud Music/Party | Closed | 62 WILSON AVENUE | 40.702362 | -73.928655 | None | |
| 392 | Safety | 311_All | 2015-09-10 23:20:52 | Noise - Residential | Loud Music/Party | Closed | 3911 AVENUE S | 40.611260 | -73.929022 | None | |
| 540 | Safety | 311_All | 2015-09-08 22:52:44 | Noise - Residential | Loud Music/Party | Closed | 39 EAST 17 STREET | 40.650008 | -73.964154 | None | |
| 625 | Safety | 311_All | 2015-10-25 01:39:06 | Noise - Residential | Loud Music/Party | Closed | 63 ELLWOOD STREET | 40.860580 | -73.928626 | None | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3550056 | Safety | 311_All | 2015-07-04 20:04:37 | Noise - Residential | Loud Talking | Closed | 312 WEBSTER AVENUE | 40.633228 | -73.969974 | None | |
| 3550299 | Safety | 311_All | 2015-06-12 12:36:17 | Noise - Residential | Loud Television | Closed | 310 WEST 99 STREET | 40.797338 | -73.972172 | None | |
| 3550557 | Safety | 311_All | 2015-06-27 20:17:56 | Noise - Residential | Loud Music/Party | Closed | 749 LAFAYETTE AVENUE | 40.690804 | -73.943361 | None | |
| 3550574 | Safety | 311_All | 2015-08-08 22:08:13 | Noise - Residential | Loud Music/Party | Closed | None | 40.513024 | -74.250696 | None | |
| 3550643 | Safety | 311_All | 2015-11-08 01:38:56 | Noise - Residential | Loud Music/Party | Closed | 409 WEST 129 STREET | 40.813830 | -73.951927 | None |
148295 rows × 11 columns
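def find_actual_borough(lat, long):
    # Reverse-geocode the coordinates and look for a borough name in the address.
    locat = geolocator.reverse(str(lat) + ", " + str(long))
    loc = str(locat)
    if loc.find("Staten Island") != -1:
        return "Staten Island"
    elif loc.find("Queens") != -1:
        return "Queens"
    elif loc.find("Brooklyn") != -1:
        return "Brooklyn"
    elif loc.find("Manhattan") != -1:
        return "Manhattan"
    elif loc.find("Bronx") != -1:
        # Use the 'The Bronx' label from df1 so the later comparison can match.
        return "The Bronx"
    return np.nan  # no borough name found in the address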
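`find_actual_borough` makes one reverse-geocoding request per row, which is slow when the same or nearly identical coordinates recur. A minimal sketch of a memoizing wrapper follows. `make_cached_reverse` is a hypothetical helper, `fake_reverse` is a stand-in for `geolocator.reverse` so that the sketch runs without network access, and rounding to four decimal places is an assumption made here, not something the notebook does.

```python
from functools import lru_cache


def make_cached_reverse(reverse_fn, precision=4):
    """Wrap a reverse-geocoding callable so repeated coordinates are looked up once."""

    @lru_cache(maxsize=None)
    def _lookup(lat, lon):
        return str(reverse_fn(f"{lat}, {lon}"))

    def cached_reverse(lat, lon):
        # Rounding to 4 decimals merges points roughly ten metres apart
        # in latitude into one cache key.
        return _lookup(round(lat, precision), round(lon, precision))

    return cached_reverse


# Stand-in for geolocator.reverse; records every query it receives.
calls = []


def fake_reverse(query):
    calls.append(query)
    return "Brooklyn, New York"


lookup = make_cached_reverse(fake_reverse)
lookup(40.690599, -73.917453)
lookup(40.690601, -73.917461)  # rounds to the same key, so no second call
print(len(calls))  # 1
```

In the notebook, `make_cached_reverse(geolocator.reverse)` would play the role of `lookup`; the trade-off is that nearby points share one answer.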
The predicted borough for each service request is compared with the actual borough obtained by reverse geocoding. The share of mismatches gives the error rate, and its complement gives the accuracy of the predictor.
error_rate = 0  # running count of wrong predictions; converted to a percentage below
for index, value in resid.head(1000).iterrows():
    predicted_borough = threshold_func(value['subcategory'])
    correct_borough = find_actual_borough(value["latitude"], value["longitude"])
    folium.Marker(location=[value["latitude"], value["longitude"]],
                  icon=folium.Icon(color='purple'), popup='@' + predicted_borough).add_to(m4)
    # A request that could not be matched to a borough (NaN) also counts as a miss.
    if predicted_borough != correct_borough:
        error_rate += 1
print(error_rate/1000*100)
75.8
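The error percentage is the share of requests whose predicted borough differs from the actual one, and the accuracy is its complement (here, 100 - 75.8 = 24.2). A minimal sketch of that calculation on plain lists follows. `error_and_accuracy` is a hypothetical helper, the labels are stand-in data, and unlike the loop above it skips requests with no known borough instead of counting them as errors, which is an assumption about the intended behaviour.

```python
def error_and_accuracy(predicted, actual):
    """Return (error %, accuracy %) over pairs whose actual label is known."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a is not None]
    if not pairs:
        raise ValueError("no labelled pairs to score")
    wrong = sum(1 for p, a in pairs if p != a)
    error = 100.0 * wrong / len(pairs)
    return error, 100.0 - error


# Stand-in labels; None marks a request that could not be geocoded.
predicted = ["Brooklyn", "Brooklyn", "Queens", "Brooklyn"]
actual = ["Brooklyn", "Manhattan", "Queens", None]
err, acc = error_and_accuracy(predicted, actual)
print(err, acc)
```

Dividing by the number of scored pairs rather than a fixed 1000 keeps the percentages correct if the sample size changes.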
accuracy_rate = (1000 - error_rate)/1000 *100
print(accuracy_rate)
#plot the figures
plt.figure(figsize=(14, 8))
plt.subplot(1,2,1)
plt.title("Accuracy of borough predictor as threshold")
plt.bar(range(1), accuracy_rate)
plt.ylim(0, 100)
plt.xlabel("Threshold required")
plt.ylabel("Accuracy Percentage")
24.2
Text(0, 0.5, 'Accuracy Percentage')
#Plotting the error
#plot the figures
plt.figure(figsize=(14, 8))
plt.subplot(1,2,1)
plt.title("Error of borough predictor as threshold")
plt.bar(range(1), error_rate/1000*100)
plt.ylim(0, 100)
plt.xlabel("Threshold required")
plt.ylabel("Error Percentage")
Text(0, 0.5, 'Error Percentage')
Thank you for reading this data science tutorial.